Early risk prediction and appropriate treatment are believed to delay the
occurrence of hypertension and attendant conditions. Many hypertension
prediction models have been developed across the world, but they cannot be
generalized directly to all populations, including the Indonesian population.
This study aimed to develop and validate a hypertension risk-prediction model
using machine learning (ML). Modifiable risk factors were used as the
predictors, while the target variable was hypertension status. This study
compared several machine-learning algorithms (decision tree, random forest,
gradient boosting, and logistic regression) to develop a hypertension
prediction model. Several parameters, including the area under the receiver
operating characteristic curve (AUC), classification accuracy (CA), F1 score,
precision, and recall, were used to evaluate the models. Most of the predictors
used in this study were significantly correlated with hypertension. The
logistic regression algorithm showed better parameter values, with AUC 0.829,
CA 89.6%, recall 0.896, precision 0.878, and F1 score 0.877. ML offers the
ability to develop a quick prediction model for hypertension screening using
non-invasive factors. From this study, we estimate that 89.6% of people with
elevated blood pressure obtained on home blood pressure measurement will
show clinical hypertension.

1. INTRODUCTION


Hypertension is a serious disorder that can lead to a variety of life-threatening conditions, including
cardiovascular disease [1], [2]. It is thought to contribute to 13 to 19% of all deaths worldwide each year
[1], [3]-[5], and it is projected that about 1.56 billion people will experience hypertension by 2025 [6].
Around 22% of the world’s population aged 18 years or older have elevated blood pressure. In Indonesia,
approximately 34% of the population aged 15 years or older have high blood pressure, higher than the world
average, but only 8.8% of people with elevated blood pressure are aware of their condition [7].
Uncontrolled blood pressure increases the risk of coronary heart disease, heart failure, stroke,
myocardial infarction, atrial fibrillation, peripheral artery disease, chronic kidney disease, and cognitive
impairment. Hypertension is known to be a significant contributor to deaths caused by these diseases [8].

Hypertension is the leading modifiable risk factor for cardiovascular disease.
However, as a medical condition, high blood pressure is itself influenced by several factors, both
demographic and lifestyle related [9]. Risk factors such as age, sex, family history, smoking habit, alcohol
consumption, body mass index, waist circumference, hip circumference, and waist-hip ratio are among the
most practical and cost-effective measures for predicting cardiovascular risk as well as hypertension
[9]-[11]. Prevention of hypertension and its complications has long been a subject in the public health
domain. Population-based approaches to reducing risk factor levels through lifestyle modification are
receiving increasing attention in preventing, detecting, evaluating, and treating high blood pressure. Early
identification of risks and classification of blood pressure conditions is essential for controlling
hypertension [12]. Early identification of blood pressure levels makes it possible to classify the blood
pressure condition as normal, prehypertension, or hypertension. The classification demonstrates the
progressive nature of hypertension and highlights the possibility of early detection of prehypertension and
advanced hypertension [13].

Prediction of hypertension risk is expected to improve decision making [14]. The use of
predictive models for hypertension, either in routine care or at the community level, has many potential
benefits, including adjusting the medication and the intensity of prevention strategies in high-risk
populations. Risk prediction is used to identify individuals at high risk of hypertension so that
preventive strategies can be taken to delay or prevent its onset and so that health complications related to
hypertension can be controlled. In fact, many hypertension risk prediction models have been constructed, but
they cannot be generalized to all populations [15]. Most predictive models have been developed in developed
countries, and only a few have been based on populations in developing or less developed countries [9].
Differences between populations in risk factors and characteristics affect the results of risk
prediction [16], [17]. Hence, it is necessary to construct a risk-prediction model specific to the Indonesian
population. In addition, many existing prediction models still use a traditional statistical approach.

ML approaches to predicting and classifying health outcomes are increasingly used in the health sector.
ML, as a part of artificial intelligence (AI), is gaining immense attention in the management of chronic
disease and is considered a promising alternative to traditional methods for clinical prediction [11], [18],
[19]. Therefore, developing a hypertension prediction model using an ML approach is necessary. Identifying
and concentrating on people who are at high risk is one effective preventive strategy [12]. This study aims
to develop and validate a hypertension risk-prediction model using machine-learning algorithms for the
Indonesian population, so that it can assist in diagnosing hypertension. The combination of several methods
to detect hypertension may be of great use in both clinical and community settings, particularly for the
Indonesian population [20].


2. METHOD 

This is a cross-sectional study using the fifth wave of the Indonesia Family Life Survey (IFLS5),
conducted in 2014/2015. The IFLS is a longitudinal panel survey that began in 1993. The survey collects
extensive information at the individual, household, and community levels on socio-demographic factors and
health, and it measures vital health information, including blood pressure [21]. The data used in this study
were from individuals aged 15 years and over who had their blood pressure measured.


2.1. The features 

The predictors or features used in this study include socio-demographic factors (age, sex,
employment status, and education) [9], [14], body mass index [22], lifestyle factors (tobacco use and
physical activity), history of chronic diseases (diabetes and/or high cholesterol), blood pressure, and acute
morbidity symptoms (headache). The average of three measurements of systolic blood pressure (SBP) and
diastolic blood pressure (DBP) was used to indicate blood pressure condition. Blood pressure was recorded
using an Omron meter by trained interviewers at home with the respondent in a seated position [23]. Because
this measurement falls in the out-of-office or home blood pressure measurement (HBPM) category, we
identified respondents as having elevated blood pressure if SBP ≥ 135 mmHg and/or DBP ≥ 85 mmHg [24], [25].
Physical activity in IFLS5 was assessed using the modified International Physical Activity Questionnaire
[23]. For further analysis, we categorized physical activity into two groups: insufficient and sufficient.
The outcome variable was a diagnosis of hypertension (SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg) by a health
worker in a person aged 15 years or older, or routine use of antihypertensive medication.
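
The elevated blood pressure rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function and constant names are hypothetical, and the thresholds are read as "at or above" 135/85 mmHg, which is an assumption based on the HBPM guideline the text cites.

```python
from statistics import mean

# Assumed HBPM cut-offs in mmHg, read as "greater than or equal to".
HBPM_SBP_THRESHOLD = 135
HBPM_DBP_THRESHOLD = 85


def has_elevated_bp(sbp_readings, dbp_readings):
    """Flag elevated blood pressure from repeated home readings.

    The mean of the readings (three per respondent in the text) is compared
    with the assumed HBPM thresholds; meeting either one is enough.
    """
    mean_sbp = mean(sbp_readings)
    mean_dbp = mean(dbp_readings)
    return mean_sbp >= HBPM_SBP_THRESHOLD or mean_dbp >= HBPM_DBP_THRESHOLD


# Hypothetical respondent: mean SBP is 136 mmHg, mean DBP is 82 mmHg.
print(has_elevated_bp([130, 138, 140], [80, 82, 84]))
```

Averaging before thresholding means that a single high reading does not, by itself, flag the respondent.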


2.2. Model development and evaluation 

Before developing the hypertension prediction model, we conducted univariate analyses to
explore the characteristics of the data and to identify correlations between the predictors and the target
variable. Using the Orange data mining application, we compared several machine-learning models, namely
decision tree, random forest, gradient boosting, and logistic regression. We divided the data randomly into
training (75%) and testing (25%) sets using the Data Sampler tool in Orange.
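
For readers working outside Orange, a random 75/25 split of this kind can be approximated as follows. The sketch uses only the Python standard library; the function name, the seed, and the use of row indices are assumptions made for illustration and do not describe Orange's internals.

```python
import random


def split_indices(n_rows, train_fraction=0.75, seed=42):
    """Randomly split row indices into training and testing sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    n_train = int(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


# 30,320 is the number of analysable records reported in the Results.
train_idx, test_idx = split_indices(30320)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible, so that all four algorithms are compared on the same training and testing rows.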

To evaluate the models, we used ten-fold cross-validation. This approach splits the dataset at random
into ten equal groups, each with a comparable proportion of hypertensive people. Orange uses each subset in
turn as the test dataset, while the remaining data are used to train the models. Several parameters were
used to compare the models: the area under the curve (AUC), classification accuracy (CA), precision
(proportion of true positives among data classified as positive), recall/sensitivity (proportion of actual
positives that are correctly predicted as positive), and F1 score (harmonic mean of precision and
recall).
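
The fold construction described above, ten random groups with a comparable proportion of hypertensive people, can be sketched as a stratified split. The code below is an illustrative standard-library version with hypothetical function names and toy labels; it is not the procedure Orange runs internally.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign row indices to k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Toy labels with 10% positives (1 = hypertensive), purely illustrative.
toy_labels = [1] * 10 + [0] * 90
for train, test in cross_validation_splits(toy_labels):
    assert len(train) == 90 and len(test) == 10
```

Each row appears in exactly one test fold, so every observation contributes once to the evaluation metrics.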


3. RESULTS AND DISCUSSION 
3.1. Subject characteristics 

Of the 48,139 individuals recorded in IFLS5, 32,804 people aged 15 years or over were eligible for
this study. After data pre-processing, records for 30,320 individuals were suitable for further analysis and
for developing a prediction model. In all, 3,637 individuals (12%) had been diagnosed with hypertension.
Based on blood pressure measured at data collection, 9,992 respondents (32.96%) were identified as having
high blood pressure under the HBPM guideline (SBP ≥ 135 mmHg and/or DBP ≥ 85 mmHg), of whom 26% had been
clinically diagnosed. In bivariate analyses, all predictors in this study showed a statistically significant
association with hypertension, except for physical activity (p=0.735). Table 1 shows the characteristics of
the study participants and their association with hypertension status.
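
The relationship between the HBPM flag and clinical diagnosis can be summarized with standard screening measures computed from the "Elevated blood pressure" rows of Table 1. The sketch below treats clinical diagnosis as the reference, which is an assumption about how to read the table, and it describes the blood pressure flag alone rather than the fitted models. The function name is hypothetical.

```python
def screening_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from the "Elevated blood pressure" rows of Table 1:
# elevated and diagnosed, elevated and not diagnosed,
# not elevated and diagnosed, not elevated and not diagnosed.
measures = screening_measures(tp=2600, fp=7392, fn=1037, tn=19291)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The positive predictive value here is the same quantity as the 26% quoted in the paragraph above.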


Table 1. Characteristics of study participants

Variables                     Hypertension      Normotensive      p-value
                              n (%)             n (%)
Total                         3637 (12.0%)      26683 (88.0%)
Age (years)                                                       <0.001
  Mean                        47.92             35.89
  Std. deviation              14.48             14.34
Sex                                                               <0.001
  Male                        1323 (9.3%)       12943 (90.7%)
  Female                      2314 (14.4%)      13740 (85.6%)
Education                                                         <0.001
  Low                         2328 (14.3%)      13986 (85.7%)
  High                        1309 (9.3%)       12697 (90.7%)
Employment                                                        <0.001
  Working                     1983 (11.2%)      15717 (88.8%)
  Not working                 1654 (13.1%)      10966 (86.9%)
Tobacco use                                                       <0.001
  Yes                         1078 (9.9%)       9837 (90.1%)
  No                          2559 (13.2%)      16846 (86.8%)
Diabetes                                                          <0.001
  Yes                         286 (45.3%)       345 (54.7%)
  No                          3351 (11.3%)      26338 (88.7%)
Cholesterol                                                       <0.001
  Yes                         526 (42.8%)       704 (57.2%)
  No                          3111 (10.7%)      25979 (89.3%)
Headache*                                                         <0.001
  Yes                         2631 (14.2%)      15833 (85.8%)
  No                          1006 (8.5%)       10850 (91.5%)
Elevated blood pressure**                                         <0.001
  Yes                         2600 (26.0%)      7392 (74.0%)
  No                          1037 (5.1%)       19291 (94.9%)
Physical activity                                                 0.735
  Low                         1411 (11.9%)      10434 (88.1%)
  Enough                      2226 (12.0%)      16249 (88.0%)
Body mass index                                                   <0.01
  Mean                        25.21             22.99
  Std. deviation              4.77              4.36
Systolic (mean)               149.18 mmHg       124.38 mmHg       <0.01
Diastolic (mean)              88.85 mmHg        77.26 mmHg        <0.01

*Headache in the last four weeks
**SBP ≥ 135 and/or DBP ≥ 85 mmHg (out-of-office measurement guideline)
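
The text does not state which test produced the p-values for the categorical rows of Table 1. One common choice for two categorical variables is the chi-square test of independence with Yates continuity correction. Assuming that test, the physical activity row can be checked with the standard library alone; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction; never let the corrected difference go negative.
        diff = max(0.0, diff - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * diff ** 2 / denominator
    # For one degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Physical activity rows of Table 1: (hypertension, normotensive) counts.
statistic, p_value = chi_square_2x2(1411, 10434, 2226, 16249)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

Under this assumption the physical activity row gives a p-value of roughly 0.73, in line with the value listed in Table 1.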


Age and sex are important determinants of hypertension, and this study was no exception:
the mean age of people with hypertension was 47.9 years, while the mean age of those without
hypertension was around 35.9 years. Hypertension was also more prevalent in females (14.4%) than in
males (9.3%). There were also differences by social determinants, such as education and employment status.
Those with higher education had less hypertension (9.3%) than those with low education (14.3%).
Hypertension was more prevalent in non-tobacco users (13.2%) than in tobacco users (9.9%). Physical
activity did not differ significantly between people with and without hypertension (p=0.735). Nevertheless,
physical activity is generally considered one of the main factors that affect blood pressure, so we
included it as a predictor in model development using machine-learning algorithms.

Int J Artif Intell, Vol. 12, No. 2, June 2023: 776-784
Int J Artif Intell ISSN: 2252-8938


3.2. Model prediction performance 

Several algorithms can be used to develop models with an ML approach. We applied four of them
to develop a hypertension prediction model: decision tree, random forest, gradient boosting, and
logistic regression. The dataset, which had previously been divided into training and testing data, was
analyzed using these four algorithms. Table 2 reports the parameter values used to
assess the performance of the algorithms.


Table 2. Comparison of the performance of machine-learning models for hypertension prediction

Model                              AUC     CA      F1      Precision  Recall
Decision tree        Prediction    0.891   0.952   0.949   0.950      0.952
                     Evaluation    0.544   0.860   0.854   0.849      0.860
Random forest        Prediction    0.993   0.958   0.955   0.958      0.958
                     Evaluation    0.781   0.888   0.871   0.867      0.888
Logistic regression  Prediction    0.828   0.894   0.873   0.875      0.894
                     Evaluation    0.829   0.898   0.879   0.881      0.898
Gradient boosting    Prediction    0.842   0.899   0.882   0.884      0.899
                     Evaluation    0.821   0.893   0.874   0.874      0.893

CA: classification accuracy; AUC: area under the curve; F1: harmonic mean of precision and recall;
Precision: proportion of true positives among data classified as positive; Recall: proportion of true
positives among all positive instances in the data
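
In every row of Table 2, recall equals CA. One reading consistent with this is that precision, recall, and F1 were averaged over both classes, weighted by class size, since class-weighted recall is always identical to accuracy. The text does not state this, so the sketch below is an assumption about how such averaged values could arise; the function name and the confusion matrix are hypothetical.

```python
def weighted_metrics(tp, fn, fp, tn):
    """Class-weighted precision, recall and F1 for a binary confusion matrix."""
    def per_class(true_pos, false_neg, false_pos):
        predicted = true_pos + false_pos
        actual = true_pos + false_neg
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1, actual

    positive = per_class(tp, fn, fp)   # positive class
    negative = per_class(tn, fp, fn)   # negative class, roles swapped
    n = tp + fn + fp + tn
    names = ("precision", "recall", "f1")
    result = {
        name: (positive[i] * positive[3] + negative[i] * negative[3]) / n
        for i, name in enumerate(names)
    }
    result["accuracy"] = (tp + tn) / n
    return result


# Hypothetical confusion matrix, not taken from the study.
scores = weighted_metrics(tp=30, fn=70, fp=20, tn=880)
print(scores)
```

In this made-up example the weighted recall and the accuracy are both 0.91, even though recall for the positive class alone is only 0.30, which shows why class-averaged values should be read differently from positive-class sensitivity.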


The machine-learning algorithms showed good predictive values. Random forest and
decision tree showed better accuracy and precision than the other algorithms in the prediction results.
However, when the algorithms were evaluated in testing, logistic regression and gradient
boosting produced better parameter values. The logistic regression model had the best parameter
values overall, with AUC 0.829, accuracy 0.898, recall (sensitivity) 0.896, precision 0.878, and F1
score (the harmonic mean of precision and recall) 0.877.

The AUC obtained from the logistic regression model was 0.829, indicating that this model
distinguished between the hypertension and non-hypertension classes better than the other models. The AUC
reflects how well a model separates the two classes: the greater the AUC, the better. An AUC of 0.5
corresponds to chance-level discrimination, so values above 0.5 indicate that the classifier distinguishes
hypertension from non-hypertension better than chance. The other intuitive performance indicator used was
CA, which is the proportion of correctly predicted observations among all observations. Our model scored
0.898, indicating that it classified hypertension status with 89.8% accuracy.

Figure 1 compares the receiver operating characteristic (ROC) curves of the algorithms
used in this study. The ROC curve visualizes an algorithm's classification performance: the closer the
curve is to the top-left corner, the better the algorithm classifies. The ROC curve for logistic
regression, Figure 1(a), with an AUC of 0.829, lies closest to the top-left corner. Gradient boosting and
random forest, Figure 1(b) and Figure 1(c), have lower AUC values than logistic regression, at 0.821 and
0.781, respectively. The decision tree, Figure 1(d), has the curve furthest from the top-left corner, with
an AUC of 0.544. Thus, these ROC graphs show that logistic regression is a better classifier than the other
algorithms used in this study.
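
To make the link between the ROC curve and the AUC concrete, the sketch below builds ROC points by sweeping a threshold over predicted scores and then integrates the curve with the trapezoidal rule. It is an illustrative standard-library implementation with hypothetical function names and made-up scores, not the routine used by Orange.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Process tied scores together so ties form one diagonal segment.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels (1 = hypertensive) and predicted probabilities.
toy_y = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6]
print(auc_trapezoid(roc_points(toy_y, toy_scores)))
```

In the toy data, three of the four positive-negative pairs are ranked correctly, and the area under the curve is 0.75. A curve that hugs the top-left corner encloses an area close to 1, while the diagonal gives 0.5.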

A machine-learning model, in this case a logistic regression algorithm, is not much different from
the multivariate logistic regression analysis available in other statistical software. The overall
percentage in the classification table of the multivariate logistic regression test was 89.5%. Therefore,
home blood pressure measurement (HBPM), after controlling for other predictors, has a predictive ability
of 89.5% for diagnosing hypertension, relative to diagnosis by a physician or health worker as the
gold standard.
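
The similarity noted above follows from the model form: in both settings, logistic regression maps a weighted sum of predictors to a probability through the sigmoid function and fits the weights by minimizing the log-loss. The sketch below fits a single-predictor model by plain gradient descent on synthetic data. The data, the learning rate, and the function names are assumptions for illustration; Orange and statistical packages use their own solvers and may apply regularization.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, iterations=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Synthetic single predictor (for example, a standardized blood pressure value);
# the labels are made up for illustration and are not study data.
xs = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
predicted = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
accuracy = sum(p == y for p, y in zip(predicted, ys)) / len(ys)
print(f"w = {w:.2f}, b = {b:.2f}, accuracy = {accuracy:.2f}")
```

The share of correct classifications at the 0.5 cut-off is the same kind of quantity as the "overall percentage" in a classification table.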

High blood pressure is a major risk factor for morbidity and mortality worldwide, especially from
cardiovascular diseases [26]. In general, hypertension risk factors can be grouped into modifiable and
non-modifiable risk factors. Modifiable risk factors include diet, smoking behavior [27], [28], alcohol
consumption [29], [30], stress level [31], physical activity [32]-[34], and body mass index. Risk
factors that cannot be modified include age [35], [36], sex, parental history of hypertension, and other
genetic factors [37].

Figure 1. Receiver operating characteristic (ROC) curve for each algorithm: (a) logistic regression,
(b) gradient boosting, (c) random forest, and (d) decision tree


Early intervention, in terms of both lifestyle modification and appropriate treatment of elevated
blood pressure, is recognized to reduce the risk of hypertension [17]. Therefore, the ability to predict an
individual’s risk of developing hypertension will be very helpful for health workers and for the community
at large. Early identification of blood pressure condition will help health workers to plan and
administer lifestyle modification recommendations or therapeutic interventions to prevent or delay the
development of hypertension [9], [16], [38].

Its diagnostic accuracy and prognostic significance in predicting cardiovascular events give home
blood pressure monitoring the potential to enhance hypertension control and make it a helpful addition to
standard office blood pressure readings [39]. This study indicates that approximately 89.6% of people with
elevated blood pressure based on HBPM have clinical hypertension, where SBP ≥ 140 mmHg and/or DBP ≥
85 mmHg. This is in line with the results reported by Jacob George, who found that around 15-30% of home
blood pressure measurements are not capable of determining the classification of blood pressure [25].

Four machine-learning algorithms were tasked with producing hypertension predictions based on
non-invasive data collection. Age, sex, level of education, working status, tobacco usage, physical activity,
body mass index, diabetes history, high cholesterol history, and home blood pressure measurement are all
significant predictors of hypertension and were used in the models. The algorithms demonstrated good
prediction accuracy in general, with logistic regression doing better than the decision tree, random forest,
and gradient boosting algorithms in terms of discrimination ability. In similar studies [38], [40], the
XGBoost model produced better parameter values. XGBoost is a more regularized gradient boosting model. The
parameter values of the gradient boosting model used in this study were only slightly different from those
of the logistic regression model.

The application of ML is relatively new in public health research. ML is a field of science that
focuses on the construction and study of systems that can automatically learn from data to generate highly
accurate predictive models [10]. Machine-learning predictive models can generate robust diagnostic
parameters because they produce correct predictions from observed correlations [41]. Machine-learning
models can identify which variable or group of variables is most useful for predicting hypertension [10].
Health research could benefit from using machine-learning techniques to verify the combinations of
variables that best predict a particular outcome, which is hypertension in our case.

The use of hypertension predictive models in both health facilities and the community has several
benefits, including enabling adjustment of the prescription/therapy and of the intensity of preventive
measures in those at high risk of developing hypertension, as well as improving shared decision making
through accurate risk
communication to people at high risk. Apart from its use in routine clinical situations, the prediction of 
hypertension risk scores can also be used to identify people at high risk for inclusion in hypertension and 
project the future burden of hypertension at the community level. In each of these applications, estimates of 
hypertension risk obtained from predictive models must be accurate and valid [9], [14]. 

Illness prediction, disease categorization, and medical image recognition are just a few of
the many areas of medicine in which ML approaches have been extensively used [42]. Hypertension prediction
models built with a machine-learning approach can be robust [10], [12]. By contrast,
traditional statistical approaches such as binary logistic regression or linear regression require several
essential assumptions, such as independence of observations and absence of multicollinearity, whereas ML
approaches do not take these assumptions into account.


4. CONCLUSION 

ML is now often employed in sophisticated data analysis and optimization techniques for many
types of medical issues. Machine-learning models have been widely used for making predictions, especially
in the health sector. Although much research has been conducted on hypertension, no one can claim to
have developed a universal instrument to anticipate hypertension. Moreover, many of the prediction
models that have been developed are still underutilized, both in health care facilities and in the community.
Because hypertension is so complicated and related to so many variables, researchers tend to employ fewer
components and overlook the impact of others. The hypertension prediction model that we developed here
estimates the probability of a person’s risk of hypertension based on blood pressure measurements taken at
home. This model could be used to assist decision making at the clinical or health care facility level and
at the household or community level. Further development and
translation of machine-learning algorithms into decision support system applications is very important. The
model is easy to use, is based on simple predictors, and does not require invasive interaction with patients.
From this study, we estimate that 89.6% of people with elevated blood pressure obtained through home blood
pressure measurement will show clinical hypertension.


LIMITATIONS 

Our findings were based on a single cross-sectional study to predict hypertension. The predictors
have a restricted range of use, and their values may change over time. The study data may not reflect the
entire population of Indonesia. Longitudinal data are needed to predict the risk of new-onset
hypertension and to produce a better prediction model.